mining and digital humanity issn
Efficient Toxicity Detection in Gaming Chats: A Comparative Study of Embeddings, Fine-Tuned Transformers and LLMs
Tereshchenko, Yehor, Hämäläinen, Mika
This paper presents a comprehensive comparative analysis of Natural Language Processing (NLP) methods for automated toxicity detection in online gaming chats. Traditional machine learning models with embeddings, large language models (LLMs) with zero-shot and few-shot prompting, fine-tuned transformer models, and retrieval-augmented generation (RAG) approaches are evaluated. The evaluation framework assesses three critical dimensions: classification accuracy, processing speed, and computational costs. A hybrid moderation system architecture is proposed that optimizes human moderator workload through automated detection and incorporates continuous learning mechanisms. The experimental results demonstrate significant performance variations across methods, with fine-tuned DistilBERT achieving optimal accuracy-cost trade-offs. The findings provide empirical evidence for deploying cost-effective, efficient content moderation systems in dynamic online gaming environments.
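The hybrid moderation idea in the abstract above can be illustrated with a minimal sketch. It assumes that some classifier (for example the fine-tuned DistilBERT mentioned above) has already produced a toxicity score in [0, 1] for each message. The thresholds, the `Decision` type, and the function names are hypothetical and are not taken from the paper; the point is only to show how automated decisions at the confident extremes can shrink the queue that human moderators see.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the abstract does not specify any values.
AUTO_ALLOW_BELOW = 0.2
AUTO_BLOCK_ABOVE = 0.9


@dataclass
class Decision:
    message: str
    score: float
    action: str  # "allow", "block", or "review"


def route(message: str, score: float) -> Decision:
    """Route one chat message given a toxicity score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score < AUTO_ALLOW_BELOW:
        action = "allow"    # confidently non-toxic: no human needed
    elif score > AUTO_BLOCK_ABOVE:
        action = "block"    # confidently toxic: no human needed
    else:
        action = "review"   # uncertain: send to a human moderator
    return Decision(message, score, action)


def review_queue(decisions):
    """Return messages awaiting human review, most uncertain first.

    Uncertainty is measured here as closeness of the score to 0.5,
    which is an assumption of this sketch.
    """
    pending = [d for d in decisions if d.action == "review"]
    return sorted(pending, key=lambda d: abs(d.score - 0.5))
```

In a setup like this, the labels that moderators assign to the reviewed messages could be stored and later used as additional training data, which is one plausible reading of the continuous learning mechanism the abstract mentions.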
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning
Kestemont, Mike, De Gussem, Jeroen
Especially in the community of Digital Humanities, the automated processing of Latin texts has always been a popular research topic. In a variety of computational applications, such as text reuse detection [Franzini et al., 2015], it is desirable to annotate and augment Latin texts with useful morpho-syntactical or lexical information, such as lemmas. In this paper, we will focus on two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. Given a piece of Latin text, the task of lemmatization involves assigning each word to a single dictionary headword or 'lemma': a base-form label (preferably in a normalized orthography) grouping all word tokens which only differ in spelling and/or inflection [Knowles et al., 2004]. The task of lemmatization is closely related to that of part-of-speech (PoS) tagging [Jurafsky et al., 2000], in which each word in a running text should be assigned a tag indicating its part of speech or word class (e.g.
- North America > United States > District of Columbia > Washington (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Switzerland (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.88)
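To make the two tagging tasks in the abstract above concrete, here is a toy sketch of their input and output. It uses a small hand-written lookup table rather than the deep representation learning the paper is about, so it illustrates only what a lemmatizer and PoS tagger return, not how the authors compute it. The lexicon entries, tag names, and function names are assumptions made for this example.

```python
from collections import defaultdict

# Toy lexicon (hypothetical, for illustration only): token -> (lemma, PoS tag).
LEXICON = {
    "rex": ("rex", "NOUN"),
    "regis": ("rex", "NOUN"),
    "regem": ("rex", "NOUN"),
    "amat": ("amo", "VERB"),
    "amant": ("amo", "VERB"),
    "mihi": ("ego", "PRON"),
    "michi": ("ego", "PRON"),  # medieval spelling variant of "mihi"
}


def tag(tokens):
    """Assign each token a (token, lemma, PoS) triple.

    Unknown tokens keep their lowercased form as lemma and get the tag "UNK".
    """
    tagged = []
    for token in tokens:
        lemma, pos = LEXICON.get(token.lower(), (token.lower(), "UNK"))
        tagged.append((token, lemma, pos))
    return tagged


def group_by_lemma(tagged):
    """Group tokens that differ only in spelling or inflection under one lemma."""
    groups = defaultdict(list)
    for token, lemma, _pos in tagged:
        groups[lemma].append(token)
    return dict(groups)
```

A lookup table like this fails on any form it has not seen, which is one reason learned sequence taggers are attractive for texts with unstable orthography.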
Variable selection for clustering with Gaussian mixture models: state of the art
Talibi, Abdelghafour, Achchab, Boujemâa, Lasri, Rafik
SAA T Laboratory, University of Abdelmalek Essadi, FPL, Larache, Morocco. Corresponding author: Abdelghafour Talibi, a.talibi@uhp.ac.ma
Abstract: Mixture models have become widely used in clustering because of the probabilistic framework on which they are based. However, for modern databases, which are characterized by their large size, these models behave disappointingly when the model is specified, which makes the selection of relevant variables essential for this type of clustering. After recalling the basics of model-based clustering, this article examines variable selection methods for model-based clustering and presents opportunities for improving these methods.
I INTRODUCTION
Clustering aims to classify the objects of a population into groups such that objects in the same group are similar to each other and objects in different groups are dissimilar. Unlike supervised classification, where the number of groups is known in advance, at least for a sample, in clustering the number of groups is unknown and remains to be estimated. Many fields of research apply clustering methods to their data in order to obtain groups that help to understand and interpret the phenomenon studied.
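As a minimal sketch of what clustering with a Gaussian mixture means, the code below evaluates a one-dimensional mixture density and assigns a point to the component with the highest posterior probability. The parameters are fixed by hand and purely hypothetical; in practice they would be estimated from data, typically with the EM algorithm, and the article's actual topic (variable selection in higher dimensions) is not covered here.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, stds):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def responsibilities(x, weights, means, stds):
    """Posterior probability of each component given x (Bayes' rule)."""
    parts = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(parts)
    return [p / total for p in parts]


def assign(x, weights, means, stds):
    """Index of the component with the highest posterior probability."""
    r = responsibilities(x, weights, means, stds)
    return max(range(len(r)), key=r.__getitem__)
```

With two equally weighted components centred at 0 and 10 and unit standard deviation, `assign(1.0, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])` picks the first component and `assign(9.0, ...)` with the same parameters picks the second.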
- Africa > Middle East > Morocco (0.24)
- North America > United States > California > Alameda County > Berkeley (0.04)